This vignette assumes a SQL server at localhost (we use PostgreSQL), with patient data in OMOP Common Data Model v5.4 format in schema cdm_new_york3. The data shown in this example is synthetic, generated by Synthea(TM) Patient Generator.

library(phea)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Connect to SQL server.
dbcon <- DBI::dbConnect(RPostgres::Postgres(),
  host = 'localhost', port = 7654, dbname = 'fort',
  user = cred$pg$user, password = cred$pg$pass)

# Call setup_phea so we can use sqlt() and sql0().
setup_phea(dbcon, 'cdm_new_york3')

How to find patients who had an acute myocardial infarction (AMI) when they were clinically obese (body mass index, BMI > 30)?

A patient can have multiple myocardial infarctions, and their weight can change over time. In this example we are only interested in cases where the AMI happened while the patient’s BMI was known to be > 30.

The problem is, we cannot go back in time and measure the BMI of the patient on the day the AMI happened. Maybe someone did measure the patient’s weight on that day, but maybe not. So, what do we mean when we say “known to be obese”?

Maybe the only weight measurement available for that person is from 3 years before, or after, the day of the AMI. Or maybe the AMI happened between two weight measurements, one giving BMI > 30 hence obesity, the other giving BMI < 30 hence no obesity. Which one do you pick?

Here, we will pick whichever body measurements are closest in time to each AMI event.

Say a patient:

…that means that on day 40, the closest weight record available was the one from day 1. That record said that the patient was obese. So, we count that as an AMI that happened in a patient known to be obese.

That would not be the case had the AMI been on day 90, for example. Similarly, if that same patient had a second AMI (god forbid) on day 120, or any day after 60, that second AMI would count in the non-obese group.

Create components

Let us first compute body mass index like in another vignette.

# Weight component
# Loinc 29463-7 Body weight, OMOP concept ID 3025315
weight_component <- sqlt(measurement) |>
  filter(measurement_concept_id == 3025315) |>
  make_component(
    .ts = measurement_datetime,
    .pid = person_id)

# Height component
# Loinc 8302-2 Body height, OMOP concept ID 3036277
height_component <- sqlt(measurement) |>
  filter(measurement_concept_id == 3036277) |>
  make_component(
    .ts = measurement_datetime,
    .pid = person_id)

To which we add a component for acute myocardial infarction (AMI), from table CONDITION_OCCURRENCE.

# Acute myocardial infarction (AMI) component
# SNOMED 22298006 Myocardial infarction, OMOP concept ID 4329847
ami_component <- sqlt(condition_occurrence) |>
  filter(condition_concept_id == 4329847) |>
  make_component(
    .ts = condition_start_datetime,
    .pid = person_id)

Calculate the phenotype

Spelling out the BMI formula is straightforward:

bmi = height/(weight * weight)

From which we can write a formula for “is obese”:

is_obese = bmi > 30

Notice that while the first formula is numerical (gives a number), the second one is logical (gives a TRUE or FALSE). Both are supported transparently, as long as they’re not mixed, e.g. 3 / TRUE.

However, doing the same for “has an AMI” may not be as clear. What is the formula for “patient has an AMI event”?

Since we have a column that gives the date when the AMI happened (it’s called condition_start_datetime), we can check whether that column is empty. In each patient, for all phenotype calculations with a timestamp past the timestamp of an AMI event, that column will be populated by Phea with the date of that AMI event. For timestamps prior to the patient’s first AMI, the column is empty. If the patient has a new AMI, the date is updated.

To check if a column is empty in SQL, we use is not null:

has_ami = ami_condition_start_datetime is not null

Armed with this, we can write the final formula for “has AMI and is obese”:

obese_ami = is_obese AND has_ami

# AMI in obese patient
ami_obese <- calculate_formula(
  components = list(
    weight = weight_component,
    height = height_component,
    ami = ami_component),
  fml = list(
    
    height_in_meters = 'height_value_as_number / 100',
    
    bmi = 'weight_value_as_number / (height_in_meters * height_in_meters)',
    
    has_bmi = 'bmi is not null',
    
    is_obese = 'bmi > 30',
    
    has_ami = 'ami_condition_start_datetime is not null',
    
    ami_obese = 'has_ami and is_obese'
    ))
# Keep only patients who had AMI
ami_patients <- ami_obese |>
  filter(has_ami) |>
  select(pid) |>
  distinct() |>
  pull()

ami_obese <- ami_obese |>
  filter(pid %in% local(ami_patients))

head_shot(ami_obese) |>
  kable()
row_id pid ts window height_value_as_number weight_value_as_number ami_condition_start_datetime height_in_meters bmi has_bmi is_obese has_ami ami_obese
8060 196 1988-09-10 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE
8081 196 1989-09-16 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE
8082 196 1990-09-22 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE
8083 196 1991-09-28 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE
8084 196 1992-10-03 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE
8065 196 1993-10-09 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE
8066 196 1994-10-15 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE
8067 196 1995-10-21 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE
8068 196 1996-10-26 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE
8069 196 1997-11-01 00:00:00 186 94.4 NA 1.86 27.28639 TRUE FALSE FALSE FALSE

Keep only the records closest to each other

Phea will compute ami_obese for all points in time in each patient. If a patient has a single AMI event, but had his body weight and height measured on 6 different dates, that is 7 computations of ami_obese, unless the date of the AMI coincides with one of the dates of the body measurements.

Since we want to consider only the body measurements made closest in time to the AMI event, we can filter the rows by the window.

Phea returns the window alongside each phenotype calculation. The window is the time distance between the oldest record, and the most recent one, that went into the phenotype calculation.

Therefore, in each patient, for each AMI event, whichever row has the smallest window, that row is using the body measurements that are closest possible, in time, to the AMI event.

# Keep only the smallest window for each patient:
ami_obese_f <- ami_obese |>
  filter(has_ami & has_bmi) |>
  keep_row_by('window', c('pid', 'ami_condition_start_datetime'))

head_shot(ami_obese_f) |>
  kable()
row_id pid ts window height_value_as_number weight_value_as_number ami_condition_start_datetime height_in_meters bmi has_bmi is_obese has_ami ami_obese
8079 196 2007-12-22 364 days 186.0 94.4 2007-12-22 1.860 27.28639 TRUE FALSE TRUE FALSE
8361 204 1975-04-20 14 days 176.1 93.2 1975-04-20 1.761 30.05365 TRUE TRUE TRUE TRUE
10839 272 1990-12-30 105 days 182.8 91.2 1990-12-30 1.828 27.29245 TRUE FALSE TRUE FALSE
11376 286 1987-11-04 189 days 168.9 53.3 1987-11-04 1.689 18.68392 TRUE FALSE TRUE FALSE
11757 295 2018-07-08 154 days 181.0 91.8 2018-07-08 1.810 28.02112 TRUE FALSE TRUE FALSE
12765 321 2009-01-28 112 days 191.4 110.4 2009-01-28 1.914 30.13597 TRUE TRUE TRUE TRUE
13067 328 1989-09-28 98 days 162.4 55.3 1989-09-28 1.624 20.96781 TRUE FALSE TRUE FALSE
15489 379 2005-09-09 42 days 182.8 92.1 2005-09-09 1.828 27.56178 TRUE FALSE TRUE FALSE
15915 389 2012-03-03 147 days 169.1 79.8 2012-03-03 1.691 27.90716 TRUE FALSE TRUE FALSE
18513 452 1932-09-15 160 days 187.3 75.5 1932-09-15 1.873 21.52144 TRUE FALSE TRUE FALSE

Plot the phenotype for a random patient

ami_obese_patients <- ami_obese |>
  filter(ami_obese) |>
  select(pid) |>
  distinct() |>
  pull()

random_patient <- sample(ami_obese_patients, 1)
message('Sampled patient: ', random_patient)
#> Sampled patient: 21720
ami_obese |>
  select(
    -height_value_as_number,
    -ami_condition_start_datetime) |>
  phea_plot(pid = random_patient)
#> Collecting lazy table, done.

Obtain the SQL query that computes the phenotype

To see the SQL query underlying the phenotype, use helper function code_shot(), or dbplyr::sql_render(), or the .clip_sql option in calculate_formula().

code_shot(ami_obese)
SELECT *
FROM (
SELECT *, has_ami and is_obese AS "ami_obese"
FROM (
  SELECT *, ami_condition_start_datetime is not null AS "has_ami"
  FROM (
    SELECT *, bmi > 30 AS "is_obese"
    FROM (
      SELECT *, bmi is not null AS "has_bmi"
      FROM (
        SELECT
          *,
          weight_value_as_number / (height_in_meters * height_in_meters) AS "bmi"
        FROM (
          SELECT
            "row_id",
            "pid",
            "ts",
            "window",
            "height_value_as_number",
            "weight_value_as_number",
            "ami_condition_start_datetime",
            height_value_as_number / 100 AS "height_in_meters"
          FROM (
            SELECT
              *,
              "ts" - least(weight_ts, height_ts, ami_ts) AS "window",
              last_value(row_id) over (partition by "pid", "ts") AS "ts_row"
            FROM (
              SELECT
                "row_id",
                "pid",
                "ts",
                MAX("weight_value_as_number") OVER (PARTITION BY "pid", "..dbplyr_partion_1") AS "weight_value_as_number",
                MAX("weight_ts") OVER (PARTITION BY "pid", "..dbplyr_partion_2") AS "weight_ts",
                MAX("height_value_as_number") OVER (PARTITION BY "pid", "..dbplyr_partion_3") AS "height_value_as_number",
                MAX("height_ts") OVER (PARTITION BY "pid", "..dbplyr_partion_4") AS "height_ts",
                MAX("ami_condition_start_datetime") OVER (PARTITION BY "pid", "..dbplyr_partion_5") AS "ami_condition_start_datetime",
                MAX("ami_ts") OVER (PARTITION BY "pid", "..dbplyr_partion_6") AS "ami_ts"
              FROM (
                SELECT
                  *,
                  SUM(CASE WHEN (("weight_value_as_number" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_1",
                  SUM(CASE WHEN (("weight_ts" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_2",
                  SUM(CASE WHEN (("height_value_as_number" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_3",
                  SUM(CASE WHEN (("height_ts" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_4",
                  SUM(CASE WHEN (("ami_condition_start_datetime" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_5",
                  SUM(CASE WHEN (("ami_ts" IS NULL)) THEN 0 ELSE 1 END) OVER (PARTITION BY "pid" ORDER BY "pid", "ts" ROWS UNBOUNDED PRECEDING) AS "..dbplyr_partion_6"
                FROM (
                  SELECT
                    row_number() over () AS "row_id",
                    "pid",
                    "ts",
                    last_value(case when "name" = 'ouzkytg7046r' then "value_as_number" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "weight_value_as_number",
                    last_value(case when "name" = 'ouzkytg7046r' then "ts" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "weight_ts",
                    last_value(case when "name" = 'hvj925mu1sze' then "value_as_number" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "height_value_as_number",
                    last_value(case when "name" = 'hvj925mu1sze' then "ts" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "height_ts",
                    last_value(case when "name" = 'n8t9unwm71hv' then "condition_start_datetime" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "ami_condition_start_datetime",
                    last_value(case when "name" = 'n8t9unwm71hv' then "ts" else null end) over (partition by "pid", "name" order by "ts" rows between unbounded preceding and 0 preceding) AS "ami_ts"
                  FROM (
                    (
                      SELECT *, NULL AS "condition_start_datetime"
                      FROM (
                        (
                          SELECT
                            'ouzkytg7046r' AS "name",
                            "person_id" AS "pid",
                            "measurement_datetime" AS "ts",
                            "value_as_number"
                          FROM "cdm_new_york3"."measurement"
                          WHERE ("measurement_concept_id" = 3025315.0)
                        )
                        UNION ALL
                        (
                          SELECT
                            'hvj925mu1sze' AS "name",
                            "person_id" AS "pid",
                            "measurement_datetime" AS "ts",
                            "value_as_number"
                          FROM "cdm_new_york3"."measurement"
                          WHERE ("measurement_concept_id" = 3036277.0)
                        )
                      ) "q01"
                    )
                    UNION ALL
                    (
                      SELECT
                        "name",
                        "pid",
                        "ts",
                        NULL AS "value_as_number",
                        "condition_start_datetime"
                      FROM (
                        SELECT
                          'n8t9unwm71hv' AS "name",
                          "person_id" AS "pid",
                          "condition_start_datetime" AS "ts",
                          "condition_start_datetime"
                        FROM "cdm_new_york3"."condition_occurrence"
                        WHERE ("condition_concept_id" = 4329847.0)
                      ) "q02"
                    )
                  ) "q03"
                ) "q04"
              ) "q05"
            ) "q06"
          ) "q07"
          WHERE ("row_id" = "ts_row")
        ) "q08"
      ) "q09"
    ) "q10"
  ) "q11"
) "q12"
) "q01"
WHERE ("pid" IN (196, 204, 272, 286, 295, 321, 328, 379, 389, 452, 531, 555, 593, 602, 691, 771, 802, 928, 1055, 1059, 1062, 1232, 1327, 1328, 1463, 1564, 1597, 1678, 1696, 1749, 1926, 1964, 1972, 2082, 2186, 2205, 2213, 2249, 2374, 2488, 2512, 2553, 2599, 2698, 2707, 2714, 3090, 3199, 3210, 3217, 3257, 3316, 3327, 3351, 3359, 3370, 3377, 3406, 3419, 3426, 3450, 3460, 3511, 3558, 3594, 3734, 3829, 3859, 3860, 3937, 3986, 4067, 4087, 4216, 4221, 4246, 4299, 4308, 4396, 4449, 4462, 4488, 4529, 4589, 4742, 4771, 4833, 4850, 4912, 4918, 4995, 5104, 5145, 5147, 5185, 5190, 5221, 5332, 5333, 5342, 5430, 5512, 5565, 5601, 5738, 5744, 5854, 5906, 5916, 5963, 5999, 6021, 6040, 6042, 6099, 6180, 6200, 6223, 6235, 6245, 6268, 6335, 6497, 6520, 6543, 6562, 6617, 6648, 6735, 6748, 6805, 6845, 6957, 7001, 7026, 7377, 7385, 7429, 7469, 7486, 7501, 7522, 7524, 7527, 7559, 7560, 7651, 7677, 7700, 7720, 7738, 7772, 7847, 7888, 7920, 8054, 8086, 8159, 8184, 8218, 8220, 8268, 8278, 8379, 8437, 8475, 8564, 8646, 8668, 8729, 8784, 8786, 8819, 8865, 8889, 9032, 9165, 9251, 9302, 9372, 9410, 9469, 9498, 9517, 9521, 9529, 9571, 9589, 9599, 9644, 9701, 9774, 9797, 9994, 10113, 10249, 10267, 10293, 10316, 10420, 10525, 10630, 10638, 10645, 10679, 10680, 10712, 10733, 10747, 10755, 10809, 10878, 10901, 10938, 11003, 11011, 11026, 11061, 11074, 11123, 11138, 11141, 11149, 11166, 11170, 11199, 11320, 11339, 11374, 11385, 11404, 11466, 11624, 11706, 11789, 11840, 11866, 11872, 11978, 12034, 12053, 12082, 12226, 12282, 12347, 12371, 12395, 12462, 12516, 12601, 12647, 12747, 12852, 12866, 12903, 12931, 13001, 13091, 13149, 13183, 13293, 13339, 13402, 13421, 13458, 13471, 13542, 13647, 13691, 13763, 13789, 13804, 13809, 13814, 13820, 13825, 13856, 13858, 13875, 13903, 13930, 13969, 13980, 13988, 13989, 14021, 14027, 14071, 14098, 14105, 14192, 14201, 14213, 14349, 14410, 14453, 14498, 14500, 14523, 14644, 14704, 14720, 14904, 14909, 15074, 15200, 15251, 15311, 15388, 15398, 15605, 15636, 15659, 15673, 15921, 16087, 16113, 16169, 16208, 16255, 16271, 16397, 16433, 16452, 16458, 16522, 16580, 16641, 16646, 16698, 16703, 16716, 16770, 16774, 16803, 16857, 16859, 16871, 16893, 17020, 17023, 17047, 17093, 17254, 17373, 17561, 17586, 17625, 17629, 17747, 17775, 17809, 17834, 17838, 17946, 18049, 18100, 18164, 18227, 18309, 18344, 18356, 18439, 18528, 18602, 18603, 18626, 18646, 18741, 18850, 18856, 18878, 18879, 18902, 19150, 19153, 19156, 19244, 19247, 19301, 19339, 19464, 19522, 19527, 19528, 19668, 19700, 19724, 19802, 19815, 19889, 19907, 19919, 19949, 20002, 20032, 20060, 20075, 20136, 20167, 20173, 20179, 20209, 20223, 20267, 20310, 20340, 20359, 20411, 20424, 20432, 20523, 20527, 20536, 20592, 20667, 20691, 20719, 20747, 20827, 20935, 21075, 21183, 21243, 21270, 21273, 21278, 21289, 21304, 21376, 21422, 21550, 21563, 21673, 21677, 21708, 21720, 21806, 21809, 21861, 21936, 21939, 21966, 22012, 22121, 22134, 22148, 22156, 22157, 22180, 22242, 22460, 22499, 22506, 22577, 22657, 22677, 22680, 22872, 22884, 22945, 22961, 22964, 23053, 23062, 23064))